3 Python essentials - Plotting data
Install dependencies
First, we have to make sure that the required libraries are installed.
How are Python modules installed? Usually through a package manager such as pip or conda.
How do I find the command to install a specific package? Search for the package online and look at the installation section of its documentation.
For example, to install matplotlib, we would search for matplotlib, find https://matplotlib.org/stable/index.html, and run the command pip install matplotlib. Because this is a shell command, you have to prefix it with an ! when running it from a notebook cell.
!pip install matplotlib
Similarly, for seaborn you would search for the library, find https://seaborn.pydata.org, look at the installation instructions, and run pip install seaborn.
!pip install seaborn
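To confirm that the installs worked before moving on, you can ask Python whether it can locate each module. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of any package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two plotting libraries used in this notebook
for name in ["matplotlib", "seaborn"]:
    if is_installed(name):
        print(f"{name}: installed")
    else:
        print(f"{name}: missing, try running: pip install {name}")
```

If a library is reported as missing right after installing it, restarting the notebook kernel is usually worth trying first.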
matplotlib and seaborn
matplotlib and seaborn are both popular plotting libraries in Python.
matplotlib is a low-level plotting library that allows you to create a wide variety of plots, including line plots, scatter plots, bar plots, histograms, and more. matplotlib has many customization options, which can make it harder to use than other plotting libraries, but gives you the flexibility to create almost any kind of plot you need.
Here’s a simple example of how you could use matplotlib to create a line plot:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.title("Line Plot")
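To give a feel for the customization options mentioned above, here is a sketch of the same kind of line plot written with matplotlib's object-oriented interface (plt.subplots returns a Figure and an Axes). The color, line style, marker, and figure size are arbitrary choices for illustration, not recommendations.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a figure with a single Axes; figsize is given in inches
fig, ax = plt.subplots(figsize=(5, 3))

# Style the line: dashed red line with a circular marker at each point
ax.plot(x, y, color="tab:red", linestyle="--", marker="o", label="y = 2x")

ax.set_xlabel("X Axis")
ax.set_ylabel("Y Axis")
ax.set_title("Customized Line Plot")
ax.legend()  # uses the label passed to ax.plot
fig.tight_layout()  # avoid clipped axis labels
```

Working with the Axes object directly tends to pay off once a figure contains more than one panel, because each panel can then be styled independently.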
seaborn is a higher-level plotting library built on top of matplotlib that makes it easier to create attractive and informative statistical plots. seaborn has many built-in functions for commonly used statistical plots, such as violin plots, box plots, and heatmaps, which can save you a lot of time and make your plots look more professional.
Here’s a simple example of how you could use seaborn to create a scatter plot:
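import matplotlib.pyplot as plt
import seaborn as sns
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
sns.scatterplot(x=x, y=y)
# seaborn draws on matplotlib axes, so labels are set with matplotlib
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.title("Scatter Plot")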
Those are just simple plots, but you can check out the seaborn gallery for inspiration for more advanced plots.
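As one example of the built-in statistical plots mentioned above, the sketch below draws a correlation heatmap with sns.heatmap. The DataFrame and its column names (a, b, c) are toy values invented purely for illustration; they are not taken from any dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy values, invented for illustration only
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

# Pairwise Pearson correlation between the columns
corr = toy.corr()

# annot=True writes each correlation value into its cell;
# vmin/vmax pin the color scale to the full correlation range
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation heatmap (toy data)")
```

Fixing the color scale to the range from -1 to 1 keeps the colors comparable between heatmaps, at the cost of less contrast when all correlations are weak.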
Plotting the ESOL data
Let’s load the Delaney (ESOL) data again. This time we read it directly from the URL.
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/schwallergroup/ai4chem_course/main/notebooks/01%20-%20Basics/data/delaney-processed.csv")
# prints summary statistics for each column
df.describe()
pandas lets us plot directly from the DataFrame.
# Plot 'measured log solubility in mols per litre' vs 'Molecular Weight'
df.plot(x='Molecular Weight', y='measured log solubility in mols per litre', kind='scatter')
# Plot a histogram of 'ESOL predicted log solubility in mols per litre'
df['ESOL predicted log solubility in mols per litre'].plot(kind='hist')
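The plot method of a DataFrame returns a matplotlib Axes object, so a plot made through pandas can be refined with the usual matplotlib calls. The sketch below shows this on a small toy DataFrame; the column names weight and log_sol and all values are invented for illustration and are not the ESOL data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy values, invented for illustration only (not the ESOL data)
toy = pd.DataFrame({
    "weight": [100.0, 150.0, 200.0, 250.0],
    "log_sol": [-1.0, -2.1, -2.9, -4.2],
})

# DataFrame.plot returns the matplotlib Axes it drew on
ax = toy.plot(x="weight", y="log_sol", kind="scatter")

# Refine the pandas plot with ordinary matplotlib methods
ax.set_xlabel("Molecular weight (toy values)")
ax.set_ylabel("Log solubility (toy values)")
ax.set_title("Scatter plot from a DataFrame")
```

The same pattern should carry over to the ESOL DataFrame: keep the return value of df.plot and call the set_ methods on it to replace the long column names with shorter axis labels.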
# Scatter plot with regression line
sns.regplot(x='Molecular Weight', y='measured log solubility in mols per litre', data=df)
# Joint plot with histograms on the sides
sns.jointplot(x='Number of Rings', y='Number of H-Bond Donors', data=df)
# Box plot to show distribution of 'Polar Surface Area'
sns.boxplot(x='Polar Surface Area', data=df)
# Violin plot to show the distribution of 'Number of Rotatable Bonds'
sns.violinplot(x='Number of Rotatable Bonds', data=df)
plt.show()
# Pair plot to visualize the relationship between multiple columns
sns.pairplot(df[['Number of H-Bond Donors', 'Molecular Weight', 'measured log solubility in mols per litre']])
plt.show()